14 分布式架构工程设计

一些分布式架构设计原则：

Designs, Lessons and Advice from Building Large Distributed Systems / Building Software Systems At Google and Lessons Learned 回顾了 Google 发展的历史
The Twelve-Factor App 12-Factor 为构建 SaaS 应用提供了方法论
Notes on Distributed Systems for Young Bloods 给准备进入分布式系统领域的人的一些忠告
On Designing and Deploying Internet-Scale Services 微软 Windows Live 服务平台的一些经验性的总结文章
4 Things to Keep in Mind When Building a Platform for the Enterprise 阐述了为企业构建平台时需要牢记的四件关于软件设计方面的事
Principles of Chaos Engineering Netflix 分享了一些软件架构的经验和原则
Building Fast & Resilient Web Applications 关于如何通过弹力设计来实现快速和可容错的网站架构的演讲
Design for Resiliency 全面认识“弹力（Resiliency）”，以及弹力对于系统的重要性，并详细阐述了如何设计和实现系统的弹力
Design Principle 微软 Azure 网站上的系列文章
Eventually Consistent 关于最终一致性的文章
Writing Code that Scales 告诉我们一些很不错的写出高扩展和高性能代码的工程原则
Automate and Abstract: Lessons from Facebook on Engineering for Scale 软件自动化和软件抽象方面的一些经验教训

设计模式

微软云平台 Azure 上的设计模式
- Cloud Design Patterns
- 设计模式：可用性；
- 设计模式：数据管理；
- 设计模式：设计和实现；
- 设计模式：消息；
- 设计模式：管理和监控；
- 设计模式：性能和扩展；
- 设计模式：系统弹力；
- 设计模式：安全。
AWS Cloud Pattern AWS 云平台的一些设计模式
Design patterns for container-based distributed systems 描述了容器化下的分布式架构的设计模式
Patterns for distributed systems 讲了一些分布式系统的架构模式
A Pattern Language for Micro-Services
SOA Patterns

设计与工程实践

分布式系统的故障测试

FIT: Failure Injection Testing Netflix 公司的一篇关于做故障注入测试的文章
Automated Failure Testing Netflix 公司的自动化故障测试的一篇博文
Automating Failure Testing Research at Internet Scale Netflix 公司伙同圣克鲁斯加利福尼亚大学和 Gremlin 游戏公司一同撰写的一篇论文

弹性伸缩

4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO 讲解后端程序的主要性能指标，即响应时间和可伸缩性这两者如何能提高的解决方案
Scaling Stateful Objects 讨论了有状态和无状态的节点如何伸缩的问题
Scale Up vs Scale Out: Hidden Costs 详细分析了可伸缩性架构的不同扩展方案（横向扩展或纵向扩展）所带来的成本差异，帮助更好地选择合理的扩展方案
Best Practices for Scaling Out 讨论 Scale out 最佳实践的文章
Scalability Worst Practices 讨论了一些最差实践
Reddit: Lessons Learned From Mistakes Made Scaling To 1 Billion Pageviews A Month 一些关于系统扩展的经验教训
关于自动化弹性伸缩的文章：
- Autoscaling Pinterest；
- Square: Autoscaling Based on Request Queuing；
- PayPal: Autoscaling Applications；
- Trivago: Your Definite Guide For Autoscaling Jenkins；
- Scryer: Netflix’s Predictive Auto Scaling Engine。

一致性哈希

Consistent Hashing 一致性哈希的简单教程，有代码示例
Consistent Hashing: Algorithmic Tradeoffs 讲述了一致性哈希的一些缺陷和坑，以及各种哈希算法的性能比较
Distributing Content to Open Connect Netflix 的一个对一致性哈希的实践
Consistent Hashing in Cassandra Cassandra 中使用到的一致性哈希的相关设计

数据库分布式

Life Beyond Distributed Transactions 讨论一种基于本地事务情况下的事务机制，它基于实体和活动（Activity）的概念
How Sharding Works 探讨数据 Sharding 的文章
Why you don’t want to shard 不到万不得已不要做数据库分片
How to Scale Big Data Applications 关于怎样给大数据应用做架构扩展
MySQL Sharding with ProxySQL 用 ProxySQL 来支撑 MySQL 数据分片的实践

缓存

缓存更新的套路缓存更新的几个设计模式
Design Of A Modern Cache 设计一个现代化的缓存系统需要注意到的东西
Netflix: Caching for a Global Netflix Netflix 公司的全局缓存架构实践
Facebook: An analysis of Facebook photo caching Facebook 公司的图片缓存使用分析
How trivago Reduced Memcached Memory Usage by 50% Trivago 公司一篇分享自己是如何把 Memcached 的内存使用率降了一半的实践性文章
Caching Internal Service Calls at Yelp Yelp 公司的缓存系统架构

消息队列

Understanding When to use RabbitMQ or Apache Kafka 什么时候使用 RabbitMQ，什么时候使用 Kafka
Trello: Why We Chose Kafka For The Trello Socket Architecture Trello 的 Kafka 架构分享
LinkedIn: Running Kafka At Scale LinkedIn 公司的 Kafka 架构扩展实践
Should You Put Several Event Types in the Same Kafka Topic?
Billions of Messages a Day - Yelp’s Real-time Data Pipeline Yelp 公司每天十亿级实时消息的架构
Uber: Building Reliable Reprocessing and Dead Letter Queues with Kafka Uber 公司的 Kafka 应用
Uber: Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End Uber 公司对 Kafka 消息的端到端审计
Publishing with Apache Kafka at The New York Times 纽约时报的 Kafka 工程实践
Kafka Streams on Heroku ，Heroku 公司的 Kafka Streams 实践
Salesforce: How Apache Kafka Inspired Our Platform Events Architecture ，Salesforce 的 Kafka 工程实践
Exactly-once Semantics are Possible: Here’s How Kafka Does it ，怎样用 Kafka 让只发送一次的语义变为可能
Delivering billions of messages exactly once 一篇挑战消息只发送一次这个技术难题的文章
Benchmarking Streaming Computation Engines at Yahoo! Yahoo! 的 Storm 团队在为他们的流式计算做技术选型时，发现市面上缺乏针对不同计算平台的性能基准测试，于是研究并设计了一种方案来做基准测试

关于日志方面

Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann ，设计基于 log 结构应用架构
Building DistributedLog: High-performance replicated log service ，讲述了高性能日志系统的一些技术细节
LogDevice: a distributed data store for logs ，Facebook 分布式日志系统方面的一些工程分享

关于性能方面

Understand Latency ，一些和系统响应时间相关的文章
Common Bottlenecks ，讲述了 20 个常见的系统瓶颈
Performance is a Feature ，关注性能
Make Performance Part of Your Workflow ，一些和性能有关的设计上的平衡和美学
CloudFlare: How we built rate limiting capable of scaling to millions of domains，讲述了 CloudFlare 公司是怎样实现他们的限流功能的

关于搜索方面

Instagram: Search Architecture
eBay: The Architecture of eBay Search
eBay: Improving Search Engine Efficiency by over 25%
LinkedIn: Introducing LinkedIn’s new search architecture
LinkedIn: Search Federation Architecture at LinkedIn
Slack: Search at Slack
DoorDash: Search and Recommendations at DoorDash
Twitter: Search Service at Twitter (2014)
Pinterest: Manas: High Performing Customized Search System
Sherlock: Near Real Time Search Indexing at Flipkart
Airbnb: Nebula: Storage Platform to Build Search Backends

各公司的架构实践

High Scalability 网站会定期分享一些大规模系统架构是怎样构建的

YouTube Architecture
Scaling Pinterest
Google Architecture
Scaling Twitter
The WhatsApp Architecture
Flickr Architecture
Amazon Architecture
Stack Overflow Architecture
Pinterest Architecture
Tumblr Architecture
Instagram Architecture
TripAdvisor Architecture
Scaling Mailbox
Salesforce Architecture
ESPN Architecture
Uber Architecture
Dropbox Design
Splunk Architecture

设计模式​

设计与工程实践​

分布式系统的故障测试​

弹性伸缩​

一致性哈希​

数据库分布式​

缓存​

消息队列​

关于日志方面​

关于性能方面​

关于搜索方面​

各公司的架构实践​

设计模式

设计与工程实践

分布式系统的故障测试

弹性伸缩

一致性哈希

数据库分布式

缓存

消息队列

关于日志方面

关于性能方面

关于搜索方面

各公司的架构实践